DETAILED ACTION

Response to Amendment 

1. In response to the Office Action mailed 12/17/07, applicant has submitted an
amendment filed 3/17/08.

Claims 26-42 and 44-48 have been amended. New Claims 48-56 have been added.

Response to Arguments 

2. Applicant's arguments with respect to claims 26-56 have been considered but are 
moot in view of the new ground(s) of rejection. 

Claim Rejections - 35 USC § 103 

3. The following is a quotation of 35 U.S.C. 103(a) which forms the basis for all
obviousness rejections set forth in this Office action: 

(a) A patent may not be obtained though the invention is not identically disclosed or described as set 
forth in section 102 of this title, if the differences between the subject matter sought to be patented and 
the prior art are such that the subject matter as a whole would have been obvious at the time the 
invention was made to a person having ordinary skill in the art to which said subject matter pertains. 
Patentability shall not be negatived by the manner in which the invention was made. 

4. Claims 26-36, 40-50, and 52-56, are rejected under 35 U.S.C. 103(a) as being 
unpatentable over Chang (US 2003/0171931), in view of Nguyen et al. (US 6,263,309), 
hereafter Nguyen. 

As per Claim 26, Chang teaches a standard model creating apparatus for 
creating a standard model which shows an acoustic characteristic having a specific 
attribute and is used for speech recognition, the standard model creating apparatus
using a model that expresses a frequency parameter showing an acoustic characteristic 
("recognition model... acoustic model... customized to a user", paragraph 1; 
"frequencies", paragraph 67), the standard model creating apparatus comprising: 

a reference model storing unit operable to store a plurality of reference models 
which are models showing an acoustic characteristic having a specific attribute 
("plurality of different cohort models", paragraph 40; "enrollment data", paragraph 39; 
"selects the speakers... closest to enrollment data", paragraph 42) 

a reference model selecting unit operable to select at least one reference model 
from among the plurality of reference models stored in said reference model storing unit 
based on usage information regarding an attribute which is an object of speech 
recognition, ("plurality of different cohort models", paragraph 40; "enrollment data", 
paragraph 39; "selects the speakers... closest to enrollment data", paragraph 42; "top N 
possible cohort models", paragraph 51; where the speaker's voice qualities are 
attributes that are targeted by speaker dependent recognizers) 

a standard model creating unit operable to create the standard model by 
calculating parameters of the standard model using parameters of the at least one 
reference model selected by said reference model selection unit ("parameters for 
possible cohorts are generated", paragraph 53; "modifies the parameters... using the 
parameters in the... cohort models", paragraph 57) 

wherein said standard model creating unit includes: a standard model structure 
determining unit operable to determine a structure of the standard model which is to be 



Application/Control Number: 10/534,869 Page 4 

Art Unit: 2626 

created ("parameters for possible cohorts are generated", paragraph 53; "modifies the 
parameters... using the parameters in the... cohort models", paragraph 57) 

an initial standard model creating unit operable to determine initial values of the 
parameters specifying the standard model whose structure has been determined 
("parameters for possible cohorts are generated", paragraph 53; "modifies the 
parameters... using the parameters in the... cohort models", paragraph 57) 

a parameter estimating unit operable to estimate and calculate the parameters of 
the standard model so as to maximize or locally maximize a probability or likelihood of 
the standard model, whose initial values have been determined, with respect to the at 
least one reference model ("parameters for possible cohorts are generated", paragraph 
53; "modifies the parameters... using the parameters in the... cohort models", paragraph 
57; "likelihood", paragraph 55) 

Chang fails to teach the standard model creating apparatus using a probability 
model that expresses a frequency parameter showing an acoustic characteristic as an 
output probability, where the reference models are probability models, where the 
parameters are statistics 

Nguyen teaches the standard model creating apparatus using a probability model 
that expresses a frequency parameter showing an acoustic characteristic as an output 
probability, where the reference models are probability models, where the parameters 
are statistics ("training speakers... speaker dependent [SD] models", col. 4, lines 38-52; 
"supervector for each speaker comprises an ordered list of parameters... corresponding 
to at least a portion of the parameters of the Hidden Markov models", col. 4, lines 53-64; 
"new speaker... compute statistics... each sound unit", col. 5, lines 42-56; "HMM...
observable outputs... transition probabilities", col. 3, lines 17-43; "Gaussian 
distributions... probability distribution... Gaussian function... parameter-based speech 
modeling", col. 4, lines 3-36; "Alabama female accent", col. 7, lines 5-13; where 
parameters generally include frequency parameters when analyzing speech [speech is 
acoustic]). 

Therefore, it would have been obvious to one of ordinary skill in the art at the 
time of invention to modify Chang to include the teaching of Nguyen of the standard 
model creating apparatus using a probability model that expresses a frequency 
parameter showing an acoustic characteristic as an output probability, where the 
reference models are probability models, where the parameters are statistics, in order to 
facilitate quick speaker adaptation of models, as described by Nguyen (col. 1, lines 7- 
10; col. 1, lines 60-66). 

As per Claim 27, Chang teaches usage information creating unit operable to 
create the usage information, wherein said reference selecting unit is operable to select 
the at least one reference model from among the plurality of reference models stored in 
said reference model storing unit, based on the created usage information ("recognition 
model... acoustic model... customized to a user", paragraph 1; "enrollment data", 
paragraph 39; "selects the speakers... closest to enrollment data", paragraph 42). 

As per Claim 28, Chang suggests wherein the standard model creating
apparatus is connected to a terminal apparatus via a communication channel, and 
further comprises: a usage information receiving unit operable to receive the usage 
information from the terminal apparatus, wherein said reference model selecting unit is 
operable to select at least one reference model from among the plurality of reference 
models stored in said reference model storing unit, based on the received usage 
information ("input speech samples", paragraph 39; "remote computer", paragraph 34; 
"recognition model... acoustic model... customized to a user", paragraph 1; "enrollment 
data", paragraph 39; "selects the speakers... closest to enrollment data", paragraph 42; 
where the input speech receiving device can be a terminal) 

As per Claim 29, the limitations are similar to those in Claim 26, and so is 
rejected under similar rationale. 

As per Claim 30, Chang teaches wherein the specification information shows at 
least one of a type of an application program which uses the standard model, and 
specifications of an apparatus which uses the standard model ("recognition model... 
acoustic model... customized to a user", paragraph 1; where the user that the system is
customized to is a specification of a speaker dependent apparatus)

As per Claim 31, Chang teaches wherein the attribute includes information
regarding at least one of an age, gender, a texture of a speaker's voice, a tone of voice
changed with emotions or health condition, a speaking rate, civility in utterance, a
dialect, a type of background noise, loudness of background noise, an S/N ratio
between speech and background noise, a microphone quality, and a degree of
complexity in recognizable vocabulary ("recognition model... acoustic model...
customized to a user", paragraph 1; customizing a model to be speaker dependent
requires adapting to a speaker's voice qualities, including a texture of the voice and the 
gender). 

As per Claim 32, Chang suggests a specification information holding unit 
operable to store an application/specifications correspondence database showing a 
correspondence between an application program which uses the standard model and 
specifications of the standard model, wherein said standard model structure determining 
unit is operable to read specifications corresponding to an application program to be 
activated from the application/specifications correspondence database held by said 
specification information holding unit, and to determine the structure of the standard 
model based on the read specifications ("recognition model... acoustic model... 
customized to a user", paragraph 1; "parameters for possible cohorts are generated", 
paragraph 53; "modifies the parameters... using the parameters in the... cohort models", 
paragraph 57; where it is known that computers have multiple users, and so it is 
obvious to store some sort of correspondence between the different users and their 
respective models, and to use the correct model when a corresponding user desires to 
use the recognition system). 

As per Claim 33, Chang teaches a specification information creating unit
operable to create the specification information, wherein said model structure
determining unit is operable to determine the structure of the standard model based on
the created specification information ("recognition model... acoustic model...
customized to a user", paragraph 1; where knowledge of the user who the model is to
be adapted for is obvious to indicate to the system). 

As per Claim 34, the limitations are similar to those in Claim 28, and so is 
rejected under similar rationale. 

As per Claim 35, Chang fails to teach wherein the at least one reference model 
and the standard model are expressed using at least one Gaussian distribution, and 
said standard model structure determining unit is operable to determine at least a 
number of Gaussian mixture distributions as the structure of the standard model. 

Nguyen suggests wherein the at least one reference model and the standard 
model are expressed using at least one Gaussian distribution, and said standard model 
structure determining unit is operable to determine at least a number of Gaussian 
mixture distributions as the structure of the standard model ("training speakers... 
speaker dependent [SD] models", col. 4, lines 38-52; "supervector for each speaker 
comprises an ordered list of parameters... corresponding to at least a portion of the 
parameters of the Hidden Markov models", col. 4, lines 53-64; "new speaker... compute 
statistics... each sound unit", col. 5, lines 42-56; "HMM... observable outputs...
transition probabilities", col. 3, lines 17-43; "Gaussian distributions... probability 
distribution... Gaussian function... parameter-based speech modeling", col. 4, lines 3- 
36; "Alabama female accent", col. 7, lines 5-13; where Nguyen teaches the use of 
Gaussians in hidden markov models and so suggests where the Gaussians are what 
are adapted in a model) 

Therefore, it would have been obvious to one of ordinary skill in the art at the 
time of invention to modify Chang to include the teaching of Nguyen of the standard 
model creating apparatus using a probability model that expresses a frequency 
parameter showing an acoustic characteristic as an output probability, where the 
reference models are probability models, where the parameters are statistics, in order to 
facilitate quick speaker adaptation of models, as described by Nguyen (col. 1, lines 7- 
10; col. 1, lines 60-66). 

As per Claim 36, the limitations are similar to those in Claims 26 and 35, and so 
is rejected under similar rationale. 

As per Claims 40 and 52, the limitations are similar to those in Claim 26, and so are
rejected under similar rationale (where the class is the set of users that sound similar to
the user).

As per Claim 41, Chang teaches wherein said initial model creating unit is
operable to specify the class ID from the at least one reference model and to determine 
initial values associated with the specified ID as the initial values ("plurality of different 
cohort models", paragraph 40; "enrollment data", paragraph 39; "selects the speakers... 
closest to enrollment data", paragraph 42; "top N possible cohort models", paragraph 
51). 

As per Claim 42, the limitations are similar to those in Claim 32, and so is 
rejected under similar rationale. 

As per Claim 43, Chang suggests wherein said initial standard model creating 
unit is operable to generate the correspondence table by creating, or by obtaining from 
an outside source, an initial standard model with a class ID, that is, initial values 
associated with the class ID, or a reference model with a class ID, that is a reference 
model associated with the class ID ("recognition model... acoustic model... customized 
to a user", paragraph 1; "parameters for possible cohorts are generated", paragraph 53;
"modifies the parameters... using the parameters in the... cohort models", paragraph 57; 
where it is known that computers have multiple users, and so it is obvious to store some 
sort of correspondence between the different users and their respective models, and to 
store a label for each different user to know which model to use). 

As per Claims 44-45, 48, and 53-54, their limitations are similar to those in Claim
26, and so are rejected under similar rationale.

As per Claims 46-47, 49, and 55-56, their limitations are similar to those in Claim
29, and so are rejected under similar rationale. 

As per Claim 50, the limitations are similar to those in Claim 36, and so is 
rejected under similar rationale. 

5. Claims 37-38, and 51 are rejected under 35 U.S.C. 103(a) as being unpatentable 
over Chang, in view of Nguyen and Junqua (US 6,253,181).

As per Claim 37, Chang teaches a standard model creating apparatus for 
creating a standard model which shows an acoustic characteristic having a specific 
attribute and is used for speech recognition, the standard model creating apparatus 
using a model that expresses a frequency parameter showing an acoustic characteristic 
("recognition model... acoustic model... customized to a user", paragraph 1; 
"frequencies", paragraph 67), the standard model creating apparatus comprising: 

a reference model storing unit operable to store a plurality of reference models 
which are models showing an acoustic characteristic having a specific attribute 
("plurality of different cohort models", paragraph 40; "enrollment data", paragraph 39; 
"selects the speakers... closest to enrollment data", paragraph 42) 

a reference model preparing unit operable to perform at least one of obtaining a
reference model from an outside source and storing the obtained reference model in 
said reference model storing unit, and creating a new reference model and storing the 
new reference model in said reference model storing unit ("parameters for possible 
cohorts are generated", paragraph 53; "modifies the parameters... using the parameters 
in the... cohort models", paragraph 57) 

a reference model selecting unit operable to select at least one reference model 
from among the plurality of reference models stored in said reference model storing unit 
based on usage information regarding an attribute which is an object of speech 
recognition, ("plurality of different cohort models", paragraph 40; "enrollment data", 
paragraph 39; "selects the speakers... closest to enrollment data", paragraph 42; "top N 
possible cohort models", paragraph 51; where the speaker's voice qualities are 
attributes that are targeted by speaker dependent recognizers) 

a standard model creating unit operable to create the standard model by 
calculating parameters of the standard model using parameters of the at least one 
reference model selected by said reference model selection unit ("parameters for 
possible cohorts are generated", paragraph 53; "modifies the parameters... using the 
parameters in the... cohort models", paragraph 57) 

wherein said standard model creating unit includes: a standard model structure 
determining unit operable to determine a structure of the standard model which is to be 
created ("parameters for possible cohorts are generated", paragraph 53; "modifies the 
parameters... using the parameters in the... cohort models", paragraph 57) 

an initial standard model creating unit operable to determine initial values of the
parameters specifying the standard model whose structure has been determined 
("parameters for possible cohorts are generated", paragraph 53; "modifies the 
parameters... using the parameters in the... cohort models", paragraph 57) 

a parameter estimating unit operable to estimate and calculate the parameters of 
the standard model so as to maximize or locally maximize a probability or likelihood of 
the standard model, whose initial values have been determined, with respect to the at 
least one reference model ("parameters for possible cohorts are generated", paragraph 
53; "modifies the parameters... using the parameters in the... cohort models", paragraph 
57; "likelihood", paragraph 55) 

Chang fails to teach the standard model creating apparatus using a probability 
model that expresses a frequency parameter showing an acoustic characteristic as an 
output probability, where the reference models are probability models, where the 
parameters are statistics. 

Nguyen teaches the standard model creating apparatus using a probability model 
that expresses a frequency parameter showing an acoustic characteristic as an output 
probability, where the reference models are probability models, where the parameters 
are statistics ("training speakers... speaker dependent [SD] models", col. 4, lines 38-52; 
"supervector for each speaker comprises an ordered list of parameters... corresponding 
to at least a portion of the parameters of the Hidden Markov models", col. 4, lines 53-64; 
"new speaker... compute statistics... each sound unit", col. 5, lines 42-56; "HMM... 
observable outputs... transition probabilities", col. 3, lines 17-43; "Gaussian 
distributions... probability distribution... Gaussian function... parameter-based speech
modeling", col. 4, lines 3-36; "Alabama female accent", col. 7, lines 5-13; where 
parameters generally include frequency parameters when analyzing speech [speech is 
acoustic]) 

Therefore, it would have been obvious to one of ordinary skill in the art at the 
time of invention to modify Chang to include the teaching of Nguyen of the standard 
model creating apparatus using a probability model that expresses a frequency 
parameter showing an acoustic characteristic as an output probability, where the 
reference models are probability models, where the parameters are statistics, in order to 
facilitate quick speaker adaptation of models, as described by Nguyen (col. 1, lines 7- 
10; col. 1, lines 60-66). 

Chang, in view of Nguyen, fails to teach at least one of updating and adding to the
plurality of reference models stored in said reference model storing unit.

Junqua teaches at least one of updating and adding to the plurality of reference
models stored in said reference model storing unit ("adapted speech model", col. 3,
lines 12-28; "further adapted model", col. 7, lines 42-50; "supplies utterances... performs 
speech recognition... passed by the dialogue system to adaptation system", col. 4, lines 
24-36). 

Therefore, it would have been obvious to one of ordinary skill in the art at the 
time of invention to modify Chang, in view of Nguyen, to include the teaching of Junqua 
of at least one of updating and adding to the plurality of reference models stored in said 
reference model storing unit, in order to ensure that the system's models stay consistent
with any changes to the user, as described by Junqua (col. 7, lines 42-50). 

As per Claim 38, Chang, in view of Nguyen, fails to teach wherein said reference
model preparing unit is operable to perform at least one of an update and an addition to 
the plurality of reference models stored in said reference model storing unit, based on at 
least one of usage information regarding an object of recognition, and specification 
information regarding specifications of the standard model which is to be created. 

Junqua teaches wherein said reference model preparing unit is operable to 
perform at least one of an update and an addition to the plurality of reference models 
stored in said reference model storing unit, based on at least one of usage information 
regarding an object of recognition, and specification information regarding specifications 
of the standard model which is to be created ("adapted speech model", col. 3, lines 12-
28; "further adapted model", col. 7, lines 42-50; "supplies utterances... performs speech 
recognition... passed by the dialogue system to adaptation system", col. 4, lines 24-36; 
where the specific user whose model is updated is usage information for the 
corresponding models [objects of recognition]). 

Therefore, it would have been obvious to one of ordinary skill in the art at the 
time of invention to modify Chang, in view of Nguyen, to include the teaching of Junqua 
of wherein said reference model preparing unit is operable to perform at least one of an 
update and an addition to the plurality of reference models stored in said reference 
model storing unit, based on at least one of usage information regarding an object of 
recognition, and specification information regarding specifications of the standard model
which is to be created, in order to ensure that the system's models stay consistent with 
any changes to the user, as described by Junqua (col. 7, lines 42-50). 

As per Claim 51, the limitations are similar to those in Claim 37, and so is
rejected under similar rationale. 

6. Claim 39 is rejected under 35 U.S.C. 103(a) as being unpatentable over Chang, 
in view of Nguyen and Junqua, as applied to Claim 37, above, and further in view of 
Kanevsky et al. (US 6,442,519), hereafter Kanevsky. 

As per Claim 39, Chang, in view of Nguyen and Junqua, fails to teach a similarity
information creating unit operable to create, based on the at least one reference model 
stored in said reference model storing unit and at least one of specification information 
regarding specifications of the standard model which is to be created, and usage 
information regarding an attribute which is an object of speech recognition, similarity 
information showing a degree of similarity to the at least one reference model and at 
least one of the usage information and the specification information, wherein said 
reference model preparing unit is operable to determine whether or not to perform at 
least one of an update and an addition to the plurality of reference models stored in said 
reference model storing unit, based on the similarity information created by said 
similarity creating unit ("individual user is clustered with other similar users", col. 3, line
62 - col. 4, line 9; "clustered into classes of similar users according to acoustic
similarities... cluster update data", col. 7, lines 18-40; "identified similar language 
models are updated", Abstract). 

Therefore, it would have been obvious to one of ordinary skill in the art at the 
time of invention to modify Chang, in view of Nguyen and Junqua, to include the 
teaching of Kanevsky of a similarity information creating unit operable to create, based 
on the at least one reference model stored in said reference model storing unit and at 
least one of specification information regarding specifications of the standard model 
which is to be created, and usage information regarding an attribute which is an object 
of speech recognition, similarity information showing a degree of similarity to the at least 
one reference model and at least one of the usage information and the specification 
information, wherein said reference model preparing unit is operable to determine 
whether or not to perform at least one of an update and an addition to the plurality of 
reference models stored in said reference model storing unit, based on the similarity 
information created by said similarity creating unit, in order to improve speech 
recognition by computers, as described by Kanevsky (col. 3, lines 9-10). 

Conclusion 

7. Applicant's amendment necessitated the new ground(s) of rejection presented in 
this Office action. Accordingly, THIS ACTION IS MADE FINAL. See MPEP
§ 706.07(a). Applicant is reminded of the extension of time policy as set forth in 37 
CFR 1.136(a). 

A shortened statutory period for reply to this final action is set to expire THREE
MONTHS from the mailing date of this action. In the event a first reply is filed within 
TWO MONTHS of the mailing date of this final action and the advisory action is not 
mailed until after the end of the THREE-MONTH shortened statutory period, then the 
shortened statutory period will expire on the date the advisory action is mailed, and any 
extension fee pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of
the advisory action. In no event, however, will the statutory period for reply expire later 
than SIX MONTHS from the date of this final action. 

Any inquiry concerning this communication or earlier communications from the 
examiner should be directed to ERIC YEN whose telephone number is (571)272-4249. 
The examiner can normally be reached on M-F 7:30-4:00. 

If attempts to reach the examiner by telephone are unsuccessful, the examiner's 
supervisor, Patrick Edouard can be reached on 571-272-7603. The fax phone number 
for the organization where this application or proceeding is assigned is 571-273-8300. 

Information regarding the status of an application may be obtained from the
Patent Application Information Retrieval (PAIR) system. Status information for 
published applications may be obtained from either Private PAIR or Public PAIR. 
Status information for unpublished applications is available through Private PAIR only. 
For more information about the PAIR system, see http://pair-direct.uspto.gov. Should 
you have questions on access to the Private PAIR system, contact the Electronic 
Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a 
USPTO Customer Service Representative or access to the automated information 
system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000. 

EY 6/12/08 
/Patrick N. Edouard/ 

Supervisory Patent Examiner, Art Unit 2626 



